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I.  INTRODUCTION 


Objective  analyses  of  meteorological  variables  have  always  de¬ 
pended  on  interpolation  schemes.  Some  methods  center  on  local  fits 
of  data  to  specific  grid  points,  where  subsequent  processing  removes 
inconsistencies.  Others  attempt  to  fit  given  observations  to  a  global 
network  by  spectral  representation  of  the  variables.  But  these  usually 
require  a  priori  interpolation  to  equally-spaced  points  in  order  to 
apply  the  spectral  integration.  Flattery  (1971)1,  for  example,  used 
Hough  functions  as  a  basis  for  his  analysis  in  the  meridional  direc¬ 
tion,  while  using  Fourier  expansions  in  the  zonal  direction  and  empir¬ 
ical  functions  in  the  vertical.  But  he,  too,  required  that  the  observed 
data  first  be  interpolated  to  fixed  volume  elements  before  the  spectral 
expansions  could  be  determined.  Once  the  spectral  coefficients  are 
known,  the  variables  can  be  interpolated  to  any  desired  location.  With 
finite  difference  models,  this  redundancy  of  interpolations  may  seem 
wasteful,  but  it  has  its  advantages,  especially  because  Hough  functions 
are  period-dependent  and  can  be  used  to  control  initial  imbalances. 

For  spectral  models,  spectral  coefficients  are  the  required  initial 
conditions,  but,  unless  Hough  functions  are  used  in  solving  the  model 
equations,  they  must  first  be  transformed  to  the  correct  basis.  This 
may  be  accomplished  through  relationships  between  the  Hough  functions 
and  the  desired  basis  or  through  numerical  integration  of  the  values 
represented  at  given  grid  points  as  with  the  finite  difference  models. 

In  either  case,  current  analysis  techniques  require  interpolation  of 

„  Observed  data.,  to  -equally- spaced  points,  a  step  which  may  alter  the 

■  *  t  2 

spectral  nature  o£  the  data,  as  pointed  out  by  Yang  and  Shapiro  (1973)  . 

i 

•  :  :  ■  ! 

•  *  l 

1.  Flattery,  T.'wl,  1971:  Spectral  models  for  global  analysis  and 
forecasting , *Proc.  Sixth  AWS  Tech.  Exchange  Conf . ,  U.  S.  Naval 
Academy,  AWS  T^ch  Rep  242,  42-53. 

2.  Yang,  C.-H.  and  R.  Shapiro,  1973:  The  effects  of  the  observa¬ 
tional  system  and  the  method  of  interpolation  on  the  computation 
of  spectra,  J.  Atmos.  Sci.  ,  30,  530-536. 


This  is  mainly  due  to  the  weights  imposed  during  interpolation.  Be¬ 
cause  of  distance  and  distribution  of  observation  points  vis-a-vis 
the  grid  points,  certain  data  will  be  unevenly  weighted.  One  would 
therefore  like  to  avoid  interpolation,  if  at  all  possible,  and  derive 
the  spectral  coefficients  directly  from  the  unevenly  spaced  data. 

This  work  will  examine  some  possible  procedures  for  accomplishing  this 
and  the  problems  related  to  each  of  the  methods. 

II.  ORTHOGONAL  EXPANSIONS 


Before  discussing  the  possible  methods  for  deriving  spectral 
coefficients  from  randomly  distributed  data  points,  it  may  be  useful 
to  review  some  of  the  properties  associated  with  expansions  in  ortho¬ 
gonal  bases.  A  set  of  functions  |<i>n(x)|  is  orthonormal  over  x  e[a,b] 


•'a 


k  k 

cj)  (x)  <J> ,  (x)  dx  =  6,  where  6.  =  1  when  j  =  k  and  =  0  otherwise. 
*  3  3  3 


There  are  certain  properties  associated  with  with  respect  to  an 

arbitrary  function  f(x)  in  [a,b]  that  are  important: 

1.  If  f(x)  is  bounded  and  continuous  at  all  but  a  finite  number 

of  points,  it  may  be  expressed  as  an  infinite  series  of  the  orthonormal 


functions,  i.e.,  f(x) 


...  r 


a^<{K  (x)  except  at  the  points  of  discontinuity. 


2.  The  coefficients  ja^,  jean  be  derived  by  implementing  the  ortho- 

*  b 

a 

For  any  k, 


normal  property  of 


w- 


<{>k  f  (x)dx. 


3.  The  coefficients 
any  truncation  0  <  M  <  °°.  Thus,  for  any 


/a.  >  satisfy  a  least-squares  criterion  for 

i  M  .  rb r  M  i2 


M,  /^(x)  -  £  a.O.T 
Ja  i=0 


dx 


will  be  a  minimum. 


When  dealing  with  a  set  of  discrete  data,  integrals  may  have 
to  be  approximated  in  order  to  employ  orthogonal  expansions.  When 
the  data  are  evenly  spaced,  approximations  to  integrals  can  become 
very  close,  depending  on  the  orthonormal  functions  in  question  and 
the  highest  degree  required.  Fourier  transforms,  for  example. 
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can  be  approximated  with  a  great  deal  of  precision  over  the  interval 
[-h,it}  if  sufficient  points  and  their  spacing  are  suitable  for  the  set 
of  functions  |cos  nx  ^  , jsin  nx  |  ,  or  |einX  If  the  orthogonal 
functions  are  polynomials ,  such  as  Legendre  polynomials  over  the  inter¬ 
val  [-l,l]  ,  transformations  of  polynomials  can  be  made  exact  by  intro¬ 
ducing  quadrature  formulae  with  sufficient  points  to  resolve  the  com¬ 
bined  highest  degrees  of  the  Legendre  polynomial  and  the  polynomial  in 
question.  When  the  data  are  not  equally  spaced,  or  not  spaced  with 
regard  to  the  particular  quadrature  formula  employed,  the  degree  of 
departure  will  depend  on  the  approximation  used. 


III.  THREE  METHODS 


In  deriving  the  coefficients  for  an  orthogonal  expansion  when 
the  data  points  are  unevenly  spaced,  one  must  first  specify  the  cri¬ 
teria  that  one  seeks  to  satisfy  before  solving  for  the  coefficients. 

One  may,  for  instance,  wish  that  the  series  satisfy  a  least-squares 
fit,  or  one  may  desire  that  the  coefficients  decrease  in  magnitude 
with  increasing  wavenumber.  For  this  study,  three  methods  were  se¬ 
lected  for  comparison  with  a  known  function  consisting  of  a  finite 
cosine  series  of  K  terms  with  the  coefficients  equal  to  (-l)n(n+l)  ^ 
where  n  is  the  wavenumber.  K  is  a  parameter  that  can  be  varied. 

The  generating  function,  f (x) ,  is,  thus,  equal  to 
K  i  -2 

V  (-l)J(j  +  l)  cos  2iTjx.  There  are  N  values  of  x,  selected  between 

j=0 

0  and  0.5,  where  N  is,  again,  a  parameter  of  the  problem.  The  approx¬ 
imating  series  is  assumed  to  be  truncated  at  M  terms.  The  three  methods 
are: 


1.  Derivation  of  the  coefficients  by  a  direct  least-squares 

approach.  M  +  1  independent  linear  equations  are  derived  for  the  coef- 
^  N  ,  M  _  ,  ~ 1 2  , 


ficients,  a^,  by  requiring  that 


n  r,  .  m 

x  f(V  -  x 

1=1  j= 


-  23  a.  cos  2irjx^|  be  a 


minimum.  The  coefficient  matrix  is  symmetric  with  general  term 


.  =  Y'  cos  27rjx.  •  cos  27rkx.. 
3  i=l 
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2.  Derivation  of  empirical  orthogonal  functions  (EOF’s) .  One 

chooses  a  set  of  functions  such  that  they  satisfy  the  conditions 

of  orthonormality  over  the  data  points,  i.e. , 

N 

1/N  £  Vx.)Pk<xJ  =  <S.  •  (2) 

i=l  D  K  1  ] 

These  functions  must  be  recalculated  for  each  new  distribution  of 
data  points.  The  nature  of  the  functions  is  also  arbitrary.  Or¬ 
dinarily  they  are  chosen  as  polynomials  with  coefficients  derived 
by  the  Gram-Schmidt  procedure.  For  purposes  of  comparison  in  this 
study,  the  functions  were  chosen  as  linear  combinations  of  cosines. 
That  is, 

V*’  ' 

The  Gram-Schmidt  method  is  just  as  applicable  here  as  with  poly¬ 
nomials.  f (x)  can  then  be  approximated  by  the  series 
M 

X  |3  p  (x)  ,  where  the  coefficient  S,  is  given  by  virtue  of  the 
n=0  n  n  k 

orthonormality  condition,  namely 

N 

Bk  =  1/N  £  f(x.)  PrU^.  (4) 

Combinations  of  the  (3^’s  and  b_.’s  should  yield  the  equivalent  of 
the  a^'s  in  Method  1  for  cross  comparison. 

3.  Deriving  the  coefficients  one-by-one.  Were  the  cosine 
functions  truly  orthogonal  under  a  summation  over  the  randomly- 
spaced  points,  the  coefficients  could  be  derived  singly  and  would 
still  satisfy  the  least-squares  requirement.  Because  they  are  not 
orthogonal  for  the  unequally-spaced  points,  however,  coefficients 
derived  singly  will  not  satisfy  the  least-squares  requirement  for  the 
entire  series,  but  they  can  be  restricted  so  that  they  will  event¬ 
ually  decrease  in  magnitude  with  increasing  wavenumber.  This  cri¬ 
terion  dictates  that  low  wavenumbers  be  made  representative  of  the 
function’s  large  scale,  while  high  wavenumbers  represent  the  unfil¬ 
tered  smaller  waves.  To  achieve  this,  one  can  simply  sequentially 
subtract  the  longer  waves  leaving  only  the  higher  wavenumbers.  A 


k 

£  b.  cos  27rjx.  (3) 

j=0  3 
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particular  wavenumber  is  then  filtered  out  by  multiplication  by  the 
orthonormal  function  of  the  desired  wavelength  and  adding  over  all 
points.  Were  the  points  correctly  spaced  or  the  orthonormal  function 
truly  orthonormal,  the  filter  would  work  perfectly.  Because  of  uneven 
spacing,  however,  the  method  captures  more  waves  than  intended.  There 
is  a  guarantee,  however,  that  the  coefficients  of  higher  order  wave¬ 
number  will  ultimately  represent  waves  of  decreasing  length.  This 
aspect  has  been  termed  the  "finality"  condition  by  Holmstrom  (1963)^ 
who  imposed  it  on  the  creation  of  his  EOF's  for  analyzing  geopotential 
heights.  It,  in  effect,  guarantees  convergence  of  the  series  through¬ 
out  the  interval. 

Accordingly,  the  first  order  coefficient,  aQ,  should  represent 

N 

the  mean  of  the  function,  1/N  y  f(x.).  The  second  coefficient,  a., 

i=l  1  1 


represents  the  departure  from  the  mean  filtered  by  cos  2itx^  and  nor- 


N 

malized  by  ^ 

2 

cos 

2rrx. . 

i 

i- 1 

II 

•H 

1 

N 

- 

- 

/  N  2  \ 

Thus  a  =  V 

1  i=l 

f  (X.) 

1 

_ 

-  ao 

cos  2irx.  1  )  cos  2ttx  .  ] 

1  \i=l  1  ) 

is  continued  so  that 

any 

coefficient  is  given  by 

N 

L 

i=l 


k-1 


f(x 


i)  -  S  a 


COS  2ujx. 


j=0 


COS  2TTkx 


/  N  2  V1 

.  I  y]  COS  2TTkX.  J 

1  \i=i  V 


IV .  TESTS 


A  series  of  tests  was  made  with  regards  to  distribution,  number, 
and  error.  The  tests  can  best  be  understood  by  a  description  of  the 
label  connected  with  any  given  run.  A  typical  run  was  described  by  a 


3.  Holmstrom,  I.,  1963:  On  a  method  for  parametric  representation 
of  the  state  of  the  atmosphere,  Tellus ,  15,  127-149. 
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(5) 
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label  which  looked  like  d,  N=n,  M  =  m,  K=k,  R.  Capital  letters 
are  fixed,  lower  case  letters  are  variable. 


d  -  Distribution.  Three  types  of  distribution  were  tested. 
Evenly  spaced  data  points,  d  =  E,  implies  that  the  N  data  points  were 
placed  in  positions  which  would  enable  fast  Fourier  transforms,  i.e. , 

=  ( 2 j  -1)  (4N)  1 ,  j  =  1, . . N .  Random  placing  of  the  data  was 

performed  uniformly,  d  ^  U,  or  by  staggering  the  data  points,  d  =  S, 
so  that  90  percent  of  the  points  lay  in  the  interval  0  <  x  <  0.25 
and  10  percent  in  the  remaining  half. 

N,M,K  -  Number.  N  refers  to  the  number  of  data  points  (n) , 
assumed  known.  M  refers  to  the  truncation  limit  (m)  on  the  approxi¬ 
mating  series,  while  K  refers  to  the  upper  limit  (k)  on  the  generat¬ 
ing  function.  The  default  values  for  m  and  k  are  15,  when  not  spec¬ 
ified  on  the  label. 

R  -  Random  Error.  When  the  label  "R"  appears,  "observational" 
errors  were  added  to  the  data  before  the  coefficients  were  determined. 
The  error,  e,  was  uniform  random  and  of  magnitude  Z  0.01  (as  compared 
with  a  maximum  coefficient  magnitude  of  1.0) . 


Thus  a  label  U,  N  =  50,  R  refers  to  a  test  where  50  data  points 
were  uniformly  distributed  between  0  and  0.5,  15  terms  were  used  for 
both  the  generating  and  approximating  series,  and  an  error  of  at  most 
±0.01  was  added  to  each  data  point.  A  label  S,  N  =  50,  M  =  10  implies 
a  staggered  distribution  of  the  50  points,  15  terms  in  the  generating 
function  and  only  10  terms  in  the  approximating  series. 


V.  RESULTS 


Although  Methods  1  and  2  differ  procedurally ,  they  yield  the 
same  results.  The  reason  for  this  is  that  the  EOF's  are  linear  com¬ 
binations  and  of  the  same  degree  as  the  cosine  series.  Because  of 
this,  they  both  span  the  same  vector  space  and  will  give  the  same 
approximation  to  any  function.  In  general  we  can  prove  the  following 
theorem: 

Given  wo  sets  of  functions  |Q  (x)  J  and  j  <p^  (x)  |  ,  integrable 
c,  ra,h'  <nere  Qn(x)  =  b£,nHtx* '  b£.n  ^  °'  and  also  a  set  of 
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coefficients 


|Bn£  such  that  f  if  (x)  -  Yh 

*4  [_  n=0  I 


dx  is  a  minimum 


for  a  given  integrable  function  f(x),  then  the  coefficients  ja^  |  , 

M  M 

where  ^  a  6  (x)  =  ^  B  Q  (x)  are  exactly  those  which  will  minimize 
n=0  n=0  n  n 

/-bj"  M  *12 

J  f(x)  "  E  Vn(x)  dX* 

“'a  j_  n=0  J 

Proof:  Since  the  coefficients  3^,  j  =  minimize  the 

integral  of  the  squared  difference,  we  have  by  differentiation  with 
respect  to  each  3^  the  condition  f*3  J  f  (x)  -  ^  (x)  I  Q^(x)dx  =  o, 

a  L  3=°  J 

k  =  0, . ,M.  Substituting  for  Q,  and  Q.,  we  get 

v  *3 


fbL  u 

•4  L 


)  -  Y  "A <x) 


j=0 


3  3 


Y  t>£jcl^£^x^x  =  0  for  k  =  Since 


condition 


|$k(x)dx  =  0.  But  this  is  simply  the 


b£k  4  0/  we  can  eliroinate  each  term  of  the  series  sequentially  by 

starting  with  k  =  0  and  working  up  to  k  =  M.  This  leaves  us  with  the 

M  *1 

|f(x)  -  Y  pi 

3=0  J  3  J 

necessary  condition  for  proving  that  the  coefficients  a  ,  j  =  0, . ,M, 

minimize  the  squared  difference  between  the  f(x)  and  the  series  in  terms 
of  the  's, 


ft 


Because  the  two  methods  differ  computationally,  however,  they 
will  yield  different  results  when  approximate  solutions  are  either 
very  close  to  the  true  solution  or  very  far.  Table  1  exemplifies 
this  feature,  in  giving  the  root  mean  square  (rms)  error  between  coef¬ 
ficients  of  the  approximating  series  and  the  true  coefficients  for 
equally  spaced  points.  Without  the  addition  of  random  error  to  the 
data,  rms  errors  for  all  three  methods  are  within  the  noise  level  of 
the  computer  algorithms.  The  addition  of  random  error  to  the  data 
affects  all  methods  equally;  rms  errors  for  fewer  points  are  only 
slightly  greater  than  for  N  =  30.  Truncation  to  M  =  10  has  no  detri¬ 
mental  effect  even  in  the  face  of  random  error. 
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TABLE  1.  RMS  Differences  between  Coefficients  of  the  Generat¬ 
ing  Series  and  the  Coefficients  of  the  Approximating  Series  for  Uni¬ 
form  Distribution. 


Run 

Method  1 

Method  2 

Method  3 

E,N=30 

2 . 1543xl0~14 

-14 

2.5977x10 

1.9510xl0-14 

E,N=16 

2.0282xl0-14 

2.5152xl0”14 

1 . 4059xl0~14 

E ,N=30 , M=10 

2.5182xl0-14 

2.6'247xl0~14 

2.3R45xl0"14 

E ,N=16 ,M=10 

1.7838xl0-14 

-14 

2.1707x10 

1.6992xl0-14 

E,N=16,M-8 

1.8979xl0“14 

2 . 4168xl0~14 

1.8973xl0-14 

E,N=30,R 

1. 1936xl0“3 

1.1936xl0"3 

1 . 1936xl0~3 

E,N=30,M=10,R 

1.2822xl0~3 

1.2822xl0~3 

1.2822xl0_3 

Figures 

la-b  show  the  effect  of 

number  on  U  and  S  distributions 

respectively , 

for  all  three  methods. 

Methods  1  and 

2  do  not  coincide 

for  these  cases  because  they  are  still  within  the  noise  range  of  the 
algorithm,  even  for  S  distribution  where  the  rms  error  rises  rapidly 
with  decreasing  N.  It  is  obvious  from  these  figures  that  while 
Methods  1  and  2  are  better  than  Method  3  for  both  distributions,  at 
least  for  N  =  30,  they  become  highly  unreliable  for  decreasing  N. 
Method  3  is  barely  affected  by  number  and  maintains  approximately 
the  same  estimates  of  coefficients  even  when  N  approaches  the  criti¬ 
cal  minimum  of  15  points  for  resolving  the  15  cosine  waves.  Note 
also  that  with  U  distribution,  increasing  the  number  of  points  beyond 
30  has  very  little  impact  on  error  for  any  of  the  methods.  For  S 
distributions,  however,  there  is  a  distinct  effect  caused  by  vary¬ 
ing  the  number  of  points. 
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Figure  la.  RMS  difference  between  coefficients  of  the  approxi¬ 
mating  series  and  the  generating  series  as  a  function  of  number 
of  data  points  for  all  three  methods  and  for  U  distribution. 
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Figure  lb.  RMS  difference  between  coefficients  of  the  approxi¬ 
mating  series  and  the  generating  series  as  a  function  of  number 
of  data  points  for  all  three  methods  and  for  S  distribution. 
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Figures  2a-b  depict  the  effects  of  truncation  (M  =  10)  on  the 


rms  error.  Here,  error  levels  are  substantially  higher  than  for 
M  =  15.  Oddly,  the  rms  error  for  all  methods  increases  drastically 
for  N  15  for  U  distribution,  while  for  S  distribution,  only  Methods 
1  and  2  cause  a  precipitous  rise  in  rms  error  as  N  drops. 

Table  2  lists  the  rms  errors  for  runs  including  the  addition 
of  random  error  to  the  data.  Method  3  fares  much  better  here  than 
in  previous  comparisons.  In  general.  Methods  1  and  2  seem  superior 
in  cases  when  the  ratio  of  N/M  is  high,  while  Method  3  prevails  when 
the  ratio  is  low,  or  when  the  S  distribution  is  encountered.  In 
fact,  for  S  distribution  and  N  =  16,  Methods  1  and  2  fail  to  achieve 
credible  approximations  of  the  coefficients. 


TABLE  2.  RMS  Differences  between  Coefficients  of  the  Generat¬ 
ing  Series  and  the  Approximating  Series  for  Runs  Involving  the  Addi¬ 
tion  of  Random  Error  to  the  Data  (Other  than  E  Distribution  Portrayed 
in  Table  1) . 


Run 

Method  1 

Method  2 

Method  3 

U,N=45,R 

1.9280x10" 3 

1.9280xlo"3 

6. 9638xl0~3 

U,N=16,R 

5.4372 

5.4372 

2.0119xl0~2 

S,N=45,R 

7.1964xl0_1 

7 . 1964xl0_1 

4.5070xl0~2 

S,N=16,R 

Failed 

Failed 

3 . 8026xlo"2 

U,N=30,M=10,R 

3. 1442xlo"3 

3 . 1441xl0~3 

8. 7045xl0~3 

S,N=30,M=10,R 

8 . 0520xlo"3 

8. 0520xl0~3 

4. 2164xl0~2 

S,N=16,K=20,R 

Failed 

Failed 

3.8103xlo"2 

-2 
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Figure  2a.  RMS  difference  between  coefficients  of  the  approxi¬ 
mating  series  and  the  generating  series  as  a  function  of  number 
of  data  points  for  all  three  methods  and  for  U  distribution.  M 
is  truncated  at  wavenumber  10. 
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Figure  2b.  RMS  difference  between  coefficients  of  the  approximating 
series  and  the  generating  series  as  a  function  of  number  of  data 
points  for  all  three  methods  and  for  S  distribution.  M  is  truncated 
at  wavenumber  10. 


The  spectra  shown  in  Figures  3-5  demonstrate  the  effect,  by 
wavenumber,  of  the  various  errors.  Figure  1  depicts  the  spectra  ot 
the  generating  funct ions  along  with  the  appro xi mat i ng  function  for 
H,  N  U' ,  M  10,  K.  By  Table  1,  all  methods  coincide  for  this 
case  and  so  only  one  approximating  spectrum  is  depict'd.  Notice 
that  the  ettect  ot  truncation  and  the  addil  ion  ot  r  i adorn  errot  seem 
only  to  pert  mb  the  high  wavenumbers  Ik  -  ')  .  Figures  4a-c  show 
the  effects  of  i .union)  distribution  in  comparison  with  Figure  In 

Figure  4a,  differences  in  the  methods  appeal  toi  lower  wavunumbets, 
but,  despite  the  piesenco  ot  truncation  at  M  10,  departures  from 
true  values  are  not  systematic  or  severe.  The  addition  ot  random 
error  to  the  data  does  not  result  in  any  major  spectral  changes  as 
seen  in  Figure  4b.  The  roduet ion  of  data  (Mints,  however,  as  de¬ 
picted  by  Figure  4c,  has  a  pronounced  effect  on  the  spectra,  result¬ 
ing  in  a  substantial  underestimate  of  values  t or  wavenumbers  higher 
than  *_>.  Figures  5a-c  are  similai  to  Figures  4a-c,  except  that  they 
are  for  S  distribution.  Here  the  departures  are  more  severe  than 
tor  0  distribution,  while,  again,  the  addition  of  random  error  has 
very  little  impact .  The  reduction  ot  data,  (Mints,  however,  does 
not  result  in  a  systematic  undi  i est  imate  ot  the  spectra  as  with  U 
distribution.  it  anything,  there  seems  t o  be  an  overost imat ion  ot 
the  spectral  coefficients  in  this  case.  It  is  also  impoitant  to 
not  ice  that  the  departure  ot  Method  >  is  more  pronounced  tor  higher 
wavenumbers  than  Methods  1  and  ,  but  that  they  are  both  as  accur¬ 
ate  for  lower  wavenumbers . 

IK'piction  ot  the  function  itself  is  found  in  Figures  t>  and  7. 
Figures  i>a-b  are  tot  Methods  L  and  2  and  Method  t,  respect ively, 
tor  N  -  to,  M  10.  both  U  and  F  distributions  are  shown  on  each 
graph.  Here,  one  can  clearly  see  the  superioiity  ot  Methods  1  and 
2  over  Method  1  as  far  as  accuracy  is  concerned.  When  N  drops  to 
Hi  as  in  Figures  7u-b,  this  super  lot  it y ,  especially  with  regards  to 
the  s  distribution,  is  non-existent.  Not  ice  that  Methods  1  and  2 
still  guarantee  a  least -squares  lit  to  the  data.  Fven  tor  F,  N  -  lb, 
M  10  (Figute  7a)  where  very  large  diserepaneies  appear .  the  approx- 


Figure  3.  Spectra  of  the  approximating  series  for  all 
three  methods  and  for  the  generating  series  for  E, 

N  =  30,  M  *  10,  R. 


18 


Figure  4a.  Spectra  for  the  generating  series,  Methods 
1  and  2,  and  Method  3  for  U,  N  =  30,  M  =  10. 
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Figure  4b,  Sp>ctra  for  the  generating  series.  Methods 
1  and  2,  and  Method  3  for  U,  N  =  30,  M  =  10,  R. 
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Figure  5a.  Spectra  for  the  generating  series,  Methods 
1  and  2,  and  Method  3  for  S,  N  =  30,  M  =  10. 
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Figure  5b.  Spectra  for  the  generating  series,  Methods 
1  and  2,  and  Method  3  for  S,  N  =  30,  M  =  10,  R. 
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Figure  5c.  Spectra  for  the  generating  series,  Methods 
1  and  2,  and  Method  3  for  S,  N  =  16,  M  =  10. 
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Methods  I  and  2 


e  6a.  Values  of  the  generating  series  and  the  approximating 
s  for  both  U  and  S  distribution,  N  =  30 ,  M  =  10 ,  as  a  function 
for  Methods  1  and  2. 
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Figure  7a.  Values  of  the  generating  series  and  the  approximating 
series  for  both  U  and  S  distribution,  N  =  16,  M  =  10,  as  a  function 
of  x  for  Methods  1  and  2. 
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Figure  7b.  Values  of  the  generating  series  and  the  approximating 
series  for  both  U  and  S  distribution,  N  =  16,  M  =  10,  as  a  func¬ 
tion  of  x  for  Method  3. 


imatirtg  function  still  minimizes  the  squared  error  with  the  qiven 
data.  Apparently,  however,  the  distribution  of  the  data  points  left 
out  the  interval  between,  roughly,  0.25  and  0.45,  allowing  the  func¬ 
tion  to  roam  freely  between  these  values.  Method  3  may  not  fit  the 
data  very  well  in  places  but,  because  of  its  finality  condition,  pre¬ 
vents  runaway  errors . 

VI.  SUMMARY  AND  CONCLUSION 

Three  methods  have  been  studied  which  yield  coefficients  of 
a  finite  cosine  series  in  an  interval,  given  a  sampling  of  data  Ln 
that  interval.  The  first  method  directly  solved  the  least-squares 
condition  by  matrix  inversion.  The  second  involved  the  calculation 
of  EOF's,  while  the  third  imposed  the  condition  of  finality  and  the 
coefficients  were  derived  singly.  It  was  shown  that  since  the  EOF's 
were  specified  as  linear  combinations  of  cosines,  the  first  two  meth¬ 
ods  would  yield  the  same  results. 

The  following  conclusions  can  be  drawn  from  examining  the  rms 
errors  and  the  spectra  of  the  various  approximating  series  produced 
by  the  methods: 

1.  Number  is  less  of  a  factor  than  distribution.  Even  when 
the  number  of  points  approaches  the  critical  value,  if  they  are 
equally  spaced,  they  will  result  in  perfect  approximation.  Uniformly 
distributed  data  will  generally  result  in  closer  approximation  than 
non-uni formly  distributed  data. 

2.  Method  3  does  not  provide  as  close  a  fit  as  Methods  1  and  2 
but  does  not  result  in  divergent  behavior  for  poorly  distributed  data. 
The  divergent  behavior  of  Methods  1  and  2  is  due  to  the  fit  of  the 
series  with  the  data  given,  which  allows  significant  degrees  of  free¬ 
dom  for  the  intervals  not  covered  by  data.  This  is  also  related  to 
the  "ill-conditioned"  nature  of  the  coefficient  matrix  which  must  be 
inverted  in  Method  1,  as  will  be  discussed  below. 

3.  The  addition  of  random  error  to  the  data  has  very  little 
impact  on  the  results.  At  least  for  the  magnitude  of  error  tested. 


the  deviations  from  unperturbed  cases  are  mimimal.  Since  the  errors 
are  uniformly  random,  the  averaging  process  over  the  data  points  acts 
as  a  filter  and  diminishes  the  effect  of  these  observation  errors. 

4.  Truncation  error  has  a  nominal  effect  on  the  coefficients 
especially  in  the  high  wavenumber  regime.  In  many  cases,  when  data 
points  are  few,  it  is  preferable  to  truncate  rather  than  to  go  to 
higher  wavenumbers.  This  is  because  solution  of  Methods  1  and  2 
makes  the  wavenumbers  interdependent,  so  that  the  truncation  is  "felt" 
in  all  coefficients.  In  Method  3,  however,  there  is  no  danger  of  this 
happening  and  the  truncation  limit  is  arbitrary. 

From  analysis  of  the  spectra  in  Figures  4  and  5,  one  notes  that 
Method  3,  although  designed  to  depict  less  of  the  function  for  increas¬ 
ing  wavenumber,  does  not  result  in  monotonically  decreasing  coefficients 
with  increasing  wavenumber.  The  reason  for  this,  as  stated  above,  is 
that,  because  of  non-uniform  distribution,  the  approximation  of  indi¬ 
vidual  coefficients  is  not  precise.  Thus  when  subtraction  of  these 
waves  from  the  oriqinal  data  is  performed,  the  remainder  will  not  be 
accurate,  leading  to  more  impreciseness.  The  method  assures,  however, 
that  the  approximation  will  eventually  catch  up  and  adjust  itself  to 
the  data,  without  diverging. 

One  may  question,  however,  whether  numerical  integration  is  per¬ 
haps  superior  to  Method  3  for  approximating  the  coefficients.  That 
is,  one  may  attempt  to  derive  a^  by  numerically  integrating 

f  S 

2  I  f  (x)  cos  2Trkx  dx  by  quadrature  formulae.  Unless  the  data  points 

*U 

are  correctly  spaced  for  the  relevant  quadrature  formula,  numerical 
integration  will  involve  unwittingly  weighting  certain  data  points. 

This  has  the  same  effect  as  interpolation  of  data  points  to  fixed  grid 
points  before  integration.  As  pointed  out  previously,  such  a  pro¬ 
cedure  could  affect  the  spectral  nature  of  the  data.  As  an  example, 
Table  3  indicates  the  rms  differences  between  the  coefficients  of  the 
generating  function  and  coefficients  derived  from  the  trapezoidal  and 
midpoint  rule  of  integration.  These  results  may  be  compared  with 
those  found  in  Tables  I  and  2  and  with  Figures  1  and  2.  The  trapezoi- 


30 


dal  rule,  being  a  closed  Newton-Cotes  quadrature  formula,  required 
inserting  the  end  values  of  the  generating  function  for  each  run. 

Note  that  for  E  distribution  the  midpoint  quadrature  is  exact,  while 
the  trapezoidal  rule  is  approximate.  In  general,  rms  differences 
for  numerical  integration  are  significantly  higher  than  for  Method 
3  (in  some  cases  by  an  order  of  magnitude) .  The  sensitivity  to  num¬ 
ber  and  distribution  of  data  points  is  also  obvious,  when  one  notes 
the  significant  increase  in  error  for  S  distribution  as  opposed  to 
U  distribution  and  the  jump  in  error  caused  by  a  reduction  in  number. 
Method  3,  which  works  on  wave  space  rather  than  physical  space,  is 
better  suited  to  represent  the  spectral  nature  of  the  particular 
function  for  these  distributions,  while  numerical  integration  by 
concentrating  on  physical  space  is  not  as  efficient. 


TABLE  3.  RMS  Differences  between  Coefficients  of  the  Generat¬ 
ing  Series  and  the  Approximating  Series  for  Runs  Involving  Trape¬ 
zoidal  and  Mid-Point  Quadrature  Integrations. 


Run 

Trapezoidal 

Mid-point 

E,N=30 

3. 39793x10” 2 

2.152034xl0*14 

E,N=30,  M=10 

3 . 27108xlo”2 

2.551610xl0-14 

U,  N=30 

8 . 3108xl0~2 

7 . 64284xl0~2 

U,  N=100,  M=10 

1 . 13156xl0~2 

3.97745xl0-3 

U,  N=30,  M=10 

5.55853xl0"2 

3.44025xl0”2 

U,  N=30,  M=10 ,  R 

5. 57984xlo”2 

3. 44947xl0~2 

U,  N=16 ,  M=10 

1.25008xl0_1 

1.07500xl0_1 

S,  N=100,  M=10 

2.96793xl0_1 

2.96183xl0_1 

S,  N=30,  M=10 

3.92814X10-1 

4.07799xl0_1 

S,  N=16 

4.91530xl0-1 

5.23083xl0-1 
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Methods  1  and  2  are  apparently  not  based  on  physical  spacing 
either  but  on  finding  a  function  which  approaches  most  closely  the 
available  data.  As  such,  the  spacing  affects  the  choice  of  coeffi¬ 
cients  by  prescribing  where  along  the  x-axis  the  approximating  func¬ 
tion  must  approach  the  true  solution.  But  the  function  is  free  to 
roam  through  other  values  of  x  where  data  are  not  available.  The 
matrix  of  Method  1  can  sometimes  serve  as  an  indicator  as  to  when 
Methods  1  and  2  will  fail.  By  examining  the  ratio  of  the  eigenvalues 
of  the  coefficient  matrix  in  the  least-squares  problem,  one  can  ascer¬ 
tain  the  amount  of  distortion  involved  in  the  mapping  of  the  solution 
vector  to  the  given  vector  containing  the  data  points.  If  the  ratio 
of  eigenvalues  is  very  large,  it  is  an  indication  that  a  great  deal  of 
magnification  and  diminution  will  be  present  in  the  mapping,  leading  to 
misrepresentation.  Unfortunately,  this  is  not  a  necessary  condition 
for  the  solution  to  diverge  from  the  true  solution.  In  the 
problem  considered  here,  the  matrix  is  real  symmetric  with  entries 


c^_.  given  by 


(1)  . 


The  eigenvalues  are  therefore  always  real.  The 


ideal  ratio  of  extreme  eigenvalues  y  (i.e.,  for  E  distribution)  is 


2,  because  the  elements  of  the  matrix  are  simply  c^_.  =  6^  N/2,  ex¬ 
cept  when  k  -  j  =  0,  where  cqo  =  N.  For  S,  N  =  16,  y  =  -2.126  x  10 
a  clear  indication  as  to  why  the  method  failed  for  that  run.  Yet 


for  U,  N  =  30,  where  the  fit  was  very  close  (according  to  rms  differ- 

2 

ences  in  the  coefficients),  y  =  1.681  x  10  ,  but  for  U,  N  =  100, 

M  =  10,  where  the  fit  was  close,  but  not  as  close  as  U ,  N  =  30,  y  = 

3.343.  For  U,  N  =  30,  M  =  10,  R,  where  the  fit  is  only  slightly 

worse  than  U,  N  =  100,  M  =  10,  y  =  5.442.  S ,  N  =  30,  M  =  10,  R 

yields  results  which  are  only  slightly  worse  than  for  the  U  distri- 

2 

bution,  but  its  y  =  9.463  x  10  .  Thus,  other  criteria  must  be  sought 
to  enable  evaluation  of  Methods  1  and  2  and  their  applicability. 


In  meteorological  analysis  one  often  encounters  areas  of  sparse 
data  similar  to  the  S  distribution  in  this  study.  Results  here  in¬ 
dicate  that  attempts  to  analyze  data  according  to  Methods  1  and  2 
may  produce  runaway  errors  in  regions  of  no  coverage.  Method  3,  on 
the  other  hand,  while  being  a  safe  method,  could  lead  to  serious  in¬ 
accuracies.  It  may  be  best  to  combine  methods  by  allowing  Method  3 
to  provide  a  first  guess  in  sparse  areas  which  cam  then  be  used  in 


conjunction  with  the  given  data  in  data-dense  areas  to  provide  analyses 
according  to  Methods  1  and  2.  The  fact  that  Methods  1  and  2  will  both 
provide  similar  coefficients  will  allow  a  pragmatic  evaluation  of  the 
employability  of  each  method  for  any  given  problem.  For  numerous  coeffi¬ 
cients  the  inversion  of  large  matrices  may  be  alleviated  by  employing 
Method  2.  In  other  cases  a  simple  matrix  inversion  may  prove  easier  than 
the  two-step  EOF  method. 

This  study  was  limited  to  one  dimension.  The  introduction  of  a 
second  dimension  complicates  the  application  of  the  three  methods. 

If  points  are  randomly  distributed  in  two-dimensional  space,  it  is  not 
obvious  how  one  can  represent  a  sum  of  functions  to  approximate  the  data. 
With  evenly  distributed  data,  each  dimension  is  considered  separately, 
but  this  is  not  feasible  if  points  are  not  aligned  in  any  specific  di¬ 
rection.  Future  research  will  attempt  to  solve  these  fundamental  prob¬ 
lems  and  extend  the  results  obtained  here  to  higher  dimensions. 


